Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of Iceberg remove_orphan_files #14383

Merged
merged 6 commits into from
Oct 17, 2022

Conversation

alexjo2144
Copy link
Member

@alexjo2144 alexjo2144 commented Sep 29, 2022

Description

Avoid reading manifests multiple times during the remove_orphan_files valid file set construction.

Fixes: #13691

Non-technical explanation

Reduce the file I/O required during remove_orphan_files

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# Iceberg
* Improve Iceberg `remove_orphan_files` performance. ({issue}`13691`)

@cla-bot cla-bot bot added the cla-signed label Sep 29, 2022
@alexjo2144 alexjo2144 requested review from findepi, homar and erichwang and removed request for findepi September 29, 2022 20:43
Comment on lines +409 to +408
.addCopies(new FileOperation(MANIFEST, INPUT_FILE_GET_LENGTH), 6)
.addCopies(new FileOperation(MANIFEST, INPUT_FILE_NEW_STREAM), 6)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we're still reading each manifest twice, I'll take a look at that tomorrow.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that first every manifest is read here:

for (ManifestFile manifest : snapshot.allManifests(table.io())) 

and then again here:

ManifestFiles.read(manifest, table.io());
ManifestFiles.readDeleteManifest(manifest, table.io(), table.specs());

but this is still huge improvement as previously these were the numbers:

.addCopies(new FileOperation(SNAPSHOT, INPUT_FILE_GET_LENGTH), 4)
.addCopies(new FileOperation(SNAPSHOT, INPUT_FILE_NEW_STREAM), 4)
.addCopies(new FileOperation(MANIFEST, INPUT_FILE_GET_LENGTH), 24)
.addCopies(new FileOperation(MANIFEST, INPUT_FILE_NEW_STREAM), 24)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it's part of how the ManifestReader works. The factory ManifestFiles.read opens the file once to read the metadata, and then calling iterator opens the file again. I don't think there's much we can do about that

@@ -384,10 +385,41 @@ public void testPredicateWithVarcharCastToDate()
assertUpdate("DROP TABLE test_varchar_as_date_predicate");
}

@Test
public void testRemoveOrphanFiles()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you planning on adding a test for snapshot expiration as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can, probably in a separate PR

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this might be covered by @homar 's #14434

}
}

validMetadataFilePaths.addAll(metadataFileLocations(table, false));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What else does it add beyond what we already collected ourselves?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This add an optional list of things from the metadata-log,

A list (optional) of timestamp and metadata file location pairs that encodes changes to the previous metadata files for the table. Each time a new metadata file is created, a new entry of the previous metadata file location should be added to the list. Tables can be configured to remove oldest metadata log entries and keep a fixed-size log of the most recent entries after a commit.
--


<br class="Apple-interchange-newline">

}

validMetadataFilePaths.addAll(metadataFileLocations(table, false));
validMetadataFilePaths.add(versionHintLocation(table));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ReachableFileUtil could be a nice help here

in apache/iceberg#5795 it seems there is no single central place in Iceberg where ReachableFileUtil is used; rather individual helper method calls are scattered across codebase.
wonder if there is some reuse opportunity still

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think fixing it up I'd rather move this method into the Iceberg lib and have it defined once for Spark and Trino

validMetadataFilePaths.add(versionHintLocation(table));

scanAndDeleteInvalidFiles(table, session, schemaTableName, expireTimestamp, union(validDataFilePaths.build(), validDeleteFilePaths.build()), "data");
scanAndDeleteInvalidFiles(table, session, schemaTableName, expireTimestamp, validMetadataFilePaths, "metadata");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it an error to have a data file in a metadata directory (or vice versa)?
is table corruption a fair punishment?

i guess this is no behavioral change here in this PR, so no change requested. But also, since you collect full paths, it seems easy to change if we want to

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to leave it for now, but if we support more of the Iceberg write location properties / LocationProvider implementations we'll need to change this.

@@ -5256,11 +5256,14 @@ public void testRemoveOrphanFiles()
Session sessionWithShortRetentionUnlocked = prepareCleanUpSession();
assertUpdate("CREATE TABLE " + tableName + " (key varchar, value integer)");
assertUpdate("INSERT INTO " + tableName + " VALUES ('one', 1)", 1);
assertUpdate("INSERT INTO " + tableName + " VALUES ('two', 2), ('three', 3)", 2);
assertUpdate("DELETE FROM " + tableName + " WHERE key = 'two'", 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why change existing test?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It didn't exercise delete files, and also didn't validate that we didn't corrupt the table. Seemed like reasonable additions

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Let's have it as a separate commit, so that we can capture the rationale in the commit message.

otherwise it looks like a left-over from local testing

@alexjo2144 alexjo2144 force-pushed the iceberg/remove-orphan-files branch from 0016e30 to 538cfef Compare September 30, 2022 14:43
@alexjo2144 alexjo2144 requested a review from findepi September 30, 2022 14:49
@alexjo2144 alexjo2144 force-pushed the iceberg/remove-orphan-files branch 2 times, most recently from e1efed2 to a9afb2b Compare September 30, 2022 15:07
@alexjo2144 alexjo2144 force-pushed the iceberg/remove-orphan-files branch from a9afb2b to 58e4b91 Compare September 30, 2022 20:30
@findepi
Copy link
Member

findepi commented Oct 3, 2022

Is testCleaningUpWithTableWithSpecifiedLocation failure related?

@alexjo2144
Copy link
Member Author

Is testCleaningUpWithTableWithSpecifiedLocation failure related?

Yes. Looks like the change I added to use full paths doesn't handle trailing slashes well

@alexjo2144 alexjo2144 force-pushed the iceberg/remove-orphan-files branch from 93f8632 to dd57760 Compare October 3, 2022 15:03
@alexjo2144
Copy link
Member Author

I switched back to using file names rather than full paths

@alexjo2144 alexjo2144 force-pushed the iceberg/remove-orphan-files branch from dd57760 to 88dd805 Compare October 4, 2022 18:58
@alexjo2144 alexjo2144 requested a review from findepi October 6, 2022 17:09
@alexjo2144 alexjo2144 force-pushed the iceberg/remove-orphan-files branch from 4306403 to 7750214 Compare October 7, 2022 14:52
@alexjo2144
Copy link
Member Author

Addressed, thanks @findepi

@alexjo2144 alexjo2144 requested a review from findepi October 7, 2022 14:52
}

for (ManifestFile manifest : snapshot.allManifests(table.io())) {
if (!processedManifestFilePaths.add(manifest.path())) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This used to do

validMetadataFileNames.add(fileName(manifest.path())

do we not need to populate validMetadataFileNames?
is it because metadataFileLocations(table, false).stream() will do this for us?
if so, maybe we don't need validMetadataFileNames.add(fileName(snapshot.manifestListLocation())); either?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what I get for thinking, I don't need to rerun the tests it was just a small refactor 😦

Comment on lines 1241 to 1242
// Using file names may result in retaining files which could be removed. However, in practice Iceberg metadata
// and data files have UUIDs in their names which makes this unlikely.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for capturing this reasoning.
Can you also add a note why not using full paths in these sets?

@alexjo2144 alexjo2144 force-pushed the iceberg/remove-orphan-files branch from 7750214 to 7ce967d Compare October 7, 2022 16:29
@alexjo2144 alexjo2144 force-pushed the iceberg/remove-orphan-files branch from 7ce967d to 3e50c1e Compare October 7, 2022 20:17
Set<String> processedManifestFilePaths = new HashSet<>();
// Using file names may result in retaining files which could be removed. However, in practice Iceberg metadata
// and data files have UUIDs in their names which makes this unlikely. Collecting file names here rather than paths
// makes this comparison easier when checking if files should be retained.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this explanation.
I don't quiet get it yet. Intuitively, it's as easy to compare paths as it's to compare file names.
What problem did you face where you tried that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

testCleaningUpWithTableWithSpecifiedLocation was failing because /a/slash.txt vs /a//slash.txt resolve to the same file but have different paths.

I'll update the comment to address this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because /a/slash.txt vs /a//slash.txt resolve to the same file but have different paths.

But only one is "correct".

I think i understand. The metadata may contain entries with double slashes. If we're running on a file system that normalizes repeated slashes (like .. not S3), the table works. To avoid breaking it here, we cannot compare paths.

@alexjo2144 alexjo2144 force-pushed the iceberg/remove-orphan-files branch from 3e50c1e to 918ad53 Compare October 11, 2022 18:15
Add testing for deletes and ensure the operation
does not corrupt the table.
Avoid reading the Avro manifest files multiple times
while building the set of live files.
@findepi findepi force-pushed the iceberg/remove-orphan-files branch from 918ad53 to ee36f85 Compare October 12, 2022 06:52
@findepi
Copy link
Member

findepi commented Oct 12, 2022

rebased

@findepi
Copy link
Member

findepi commented Oct 12, 2022

Cassandra tests red.
@nineinchnick why were they run at all?
is it because suites are shared between iceberg and cassandra?

@nineinchnick
Copy link
Member

Cassandra tests red. @nineinchnick why were they run at all? is it because suites are shared between iceberg and Cassandra?

We pass impacted features to the product test launcher and it should be able to skip only some tests in a suite, depending on connectors (a kind of feature) used per environment.

But impact analysis was disabled, you can tell by looking at the build-pt job: https://github.com/trinodb/trino/actions/runs/3232626348/jobs/5293468407

image

I'm not sure why, but that failed job doesn't match what's currently in this branch. There were changes in tests:

WARNING: Impacted modules list includes non-plugin modules, ignoring it: [testing/trino-tests, testing/trino-faulttolerant-tests]

@findepi
Copy link
Member

findepi commented Oct 13, 2022

I'm not sure why, but that failed job doesn't match what's currently in this branch. There were changes in tests:

WARNING: Impacted modules list includes non-plugin modules, ignoring it: [testing/trino-tests, testing/trino-faulttolerant-tests]

@nineinchnick Can it be these are reported as "impacted" because both these test modules depend on trino-iceberg, which is being changed in this PR?

@nineinchnick
Copy link
Member

I'm not sure why, but that failed job doesn't match what's currently in this branch. There were changes in tests:

WARNING: Impacted modules list includes non-plugin modules, ignoring it: [testing/trino-tests, testing/trino-faulttolerant-tests]

@nineinchnick Can it be these are reported as "impacted" because both these test modules depend on trino-iceberg, which is being changed in this PR?

Yes, the list of impacted modules should only contain modules with actual changes in them and their dependants, not their dependencies. I also think I made a typo in #14541, using github.event.name instead of github.event_name, which causes all PTs to always be scheduled. I tested this just now in #12854

@findepi findepi merged commit 2daf169 into trinodb:master Oct 17, 2022
@findepi findepi mentioned this pull request Oct 17, 2022
@github-actions github-actions bot added this to the 401 milestone Oct 17, 2022
@alexjo2144 alexjo2144 deleted the iceberg/remove-orphan-files branch October 28, 2022 18:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

Iceberg connector remove_orphan_files table procedure has O(n^2) runtime
4 participants